Goto

Collaborating Authors

 lemma 4


Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards

Neural Information Processing Systems

Online advertising platforms use automated auctions to connect advertisers with potential customers, requiring effective bidding strategies to maximize profits. Accurate ad impact estimation requires considering three key factors: delayed and long-term effects, cumulative ad impacts such as reinforcement or fatigue, and customer heterogeneity. However, these effects are often not jointly addressed in previous studies. To capture these factors, we model ad bidding as a Contextual Markov Decision Process (CMDP) with delayed Poisson rewards. For efficient estimation, we propose a two-stage maximum likelihood estimator combined with data-splitting strategies, ensuring controlled estimation error based on the first-stage estimator's (in)accuracy. Building on this, we design a reinforcement learning algorithm to derive efficient personalized bidding strategies. This approach achieves a near-optimal regret bound of O(dH2 T), where d is the contextual dimension, H is the number of rounds, and T is the number of customers. Our theoretical findings are validated by simulation experiments.


Procurement Auctions with Predictions: Improved Frugality for Facility Location

Neural Information Processing Systems

We study the problem of designing procurement auctions for the strategic uncapacitated facility location problem: a company needs to procure a set of facility locations in order to serve its customers and each facility location is owned by a strategic agent. Each owner has a private cost for providing access to their facility (e.g., renting it or selling it to the company) and needs to be compensated accordingly. The goal is to design truthful auctions that decide which facilities the company should procure and how much to pay the corresponding owners, aiming to minimize the total cost, i.e., the monetary cost paid to the owners and the connection cost suffered by the customers (their distance to the nearest facility). We evaluate the performance of these auctions using the frugality ratio. We first analyze the performance of the classic VCG auction in this context and prove that its frugality ratio is exactly 3. We then leverage the learning-augmented framework and design auctions that are augmented with predictions regarding the owners' private costs. Specifically, we propose a family of learning-augmented auctions that achieve significant payment reductions when the predictions are accurate, leading to much better frugality ratios. At the same time, we demonstrate that these auctions remain robust even if the predictions are arbitrarily inaccurate, and maintain reasonable frugality ratios even under adversarially chosen predictions. We finally provide a family of "error-tolerant" auctions that maintain improved frugality ratios even if the predictions are only approximately accurate, and we provide upper bounds on their frugality ratio as a function of the prediction error.


Discretization-free Multicalibration through Loss Minimization over Tree Ensembles

Neural Information Processing Systems

In recent years, multicalibration has emerged as a desirable learning objective for ensuring that a predictor is calibrated across a rich collection of overlapping subpopulations. Existing approaches typically achieve multicalibration by discretizing the predictor's output space and iteratively adjusting its output values. However, this discretization approach departs from the standard empirical risk minimization (ERM) pipeline, introduces rounding error and an additional sensitive hyperparameter, and may distort the predictor's outputs in ways that hinder downstream decision-making. In this work, we propose a discretization-free multicalibration method that directly optimizes an empirical risk objective over an ensemble of depth-two decision trees. Our ERM approach can be implemented using off-the-shelf tree ensemble learning methods such as LightGBM. Our algorithm provably achieves multicalibration, provided that the data distribution satisfies a technical condition we term as loss saturation. Across multiple datasets, our empirical evaluation shows that this condition is always met in practice. Our discretization-free algorithm consistently matches or outperforms existing multicalibration approaches-- even when evaluated using a discretization-based multicalibration metric that shares its discretization granularity with the baselines. Code to replicate the results in this work is available at https://github.com/hjenryin/


acb3e20075b0a2dfa3565f06681578e5-Paper-Conference.pdf

Neural Information Processing Systems

This paper investigates convex-concave minimax optimization problems where only the function value access is allowed. We introduce a class of Hessianaware quantum zeroth-order methods that can find the ǫ-saddle point within O(d2/3ǫ 2/3) function value oracle calls. This represents an improvement of d1/3ǫ 1/3 over the O(dǫ 1) upper bound of classical zeroth-order methods, where d denotes the problem dimension. We extend these results to µ-stronglyconvex µ-strongly-concave minimax problems using a restart strategy, and show a speedup of d1/3µ 1/3 compared to classical zeroth-order methods. The acceleration achieved by our methods stems from the construction of efficient quantum estimators for the Hessian and the subsequent design of efficient Hessian-aware algorithms. In addition, we apply such ideas to non-convex optimization, leading to a reduction in the query complexity compared to classical methods.


Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

Neural Information Processing Systems

We consider a common case of the combinatorial semi-bandit problem, the m-set semi-bandit, where the learner exactly selects m arms from the total d arms. In the adversarial setting, the best regret bound, known to be O( nmd) for time horizon n, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, this requires to explicitly compute the arm-selection probabilities via optimizing problems at each time step and sample according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the m arms that rank among the m smallest (estimated) loss with random perturbation. In this paper, we show that FTPL with a Fréchet perturbation also enjoys the near optimal regret bound O( nm( p dlog(d) + m5/6)) in the adversarial setting and approaches best-of-both-world regret bounds, i.e., achieves a logarithmic regret for the stochastic setting. Moreover, our lower bounds show that the extra factors are unavoidable with our approach; any improvement would require a fundamentally different and more challenging method.


Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect

Neural Information Processing Systems

The weighted controlled direct effect (WCDE) generalizes the standard controlled direct effect (CDE) by averaging over the mediator distribution, providing a robust estimate when treatment effects vary across mediator levels. This makes the WCDE especially relevant in fairness analysis, where it isolates the direct effect of an exposure on an outcome, independent of mediating pathways. This work establishes three fundamental advances for WCDE in observational studies: First, we establish necessary and sufficient conditions for the identifiability of the WCDE, clarifying when it diverges from the CDE. Next, we consider nonparametric estimation of the WCDE and derive its influence function, focusing on the class of regular and asymptotically linear estimators. Lastly, we characterize the optimal covariate adjustment set that minimizes the asymptotic variance, demonstrating how mediator-confounder interactions introduce distinct requirements compared to average treatment effect (ATE) estimation. Using synthetic and real-world data, we validate our theory numerically, showing that the proposed optimal valid adjustment set yields the lowest variance at practical sample sizes. Our results offer a principled framework for efficient estimation of direct effects in complex causal systems, with practical applications in fairness and mediation analysis.


Constrained Feedback Learning for Non-Stationary Multi-Armed Bandits

Neural Information Processing Systems

Non-stationary multi-armed bandits (NSMAB) enable agents to adapt to changing environments by incorporating mechanisms to detect and respond to shifts in reward distributions, making them well-suited for dynamic settings. However, existing approaches typically assume that reward feedback is available at every round--an assumption that overlooks many real-world scenarios where feedback is limited. In this paper, we take a significant step forward by introducing a new model of constrained feedback in non-stationary multi-armed bandits (CONFEE-NSMAB), where the availability of reward feedback is restricted. We propose the first priorfree algorithm--that is, one that does not require prior knowledge of the degree of non-stationarity--that achieves near-optimal dynamic regret in this setting. Specifically, our algorithm attains a dynamic regret of O(K1/3V1/3TT/B1/3), where T is the number of rounds, K is the number of arms, B is the query budget, and VT is the variation budget capturing the degree of non-stationarity.


Generalization Bounds for Kolmogorov-Arnold Networks (KANs)and Enhanced KANs with Lower Lipschitz Complexity

Neural Information Processing Systems

Kolmogorov-Arnold Networks (KANs) have demonstrated remarkable expressive capacity and predictive power in symbolic learning. However, existing generalization errors of KANs primarily focus on approximation errors while neglecting estimation errors, leading to a suboptimal bias-variance trade-off and poor generalization performance. Meanwhile, the unclear generalization mechanism hinders the design of more effective KANs. As the authors of KANs highlighted, they "would like to explore ways to restrict KANs' hypothesis space so that they can achieve good performance." To address these challenges, we explore the generalization mechanism of KANs and design more effective KANs with lower model complexity and better generalization. We define Lipschitz complexity as the first structural measure for deep functions represented by KANs and derive novel generalization bounds based on Lipschitz complexity, establishing a theoretical foundation for understanding their generalization behavior. To reduce Lipschitz complexity and boost the generalization mechanism of KANs, we propose Lipschitz-Enhanced KANs (LipKANs) by integrating the Lip layers and pioneering the L1.5-regularization, contributing to tighter generalization bounds. Empirical experiments validate that the proposed LipKANs enhance the generalization mechanism of KANs when modeling complex distributions. We hope our theoretical insights and proposed LipKANs lay a foundation for the future development of KANs.


Optimal Rates in Continual Linear Regression via Increasing Regularization

Neural Information Processing Systems

We study realizable continual linear regression under random task orderings, a common setting for developing continual learning theory. In this setup, the worstcase expected loss after k learning iterations admits a lower bound of Ω(1/k). However, prior work using an unregularized scheme has only established an upper bound of O(1/k1/4), leaving a significant gap. Our paper proves that this gap can be narrowed, or even closed, using two frequently used regularization schemes: (1) explicit isotropic ℓ2 regularization, and (2) implicit regularization via finite step budgets. We show that these approaches, which are used in practice to mitigate forgetting, reduce to stochastic gradient descent (SGD) on carefully defined surrogate losses. Through this lens, we identify a fixed regularization strength that yields a near-optimal rate of O(logk/k). Moreover, formalizing and analyzing a generalized variant of SGD for time-varying functions, we derive an increasing regularization strength schedule that provably achieves an optimal rate of O(1/k). This suggests that schedules that increase the regularization coefficient or decrease the number of steps per task are beneficial, at least in the worst case.


Integration Matters for Learning PDEs with Backward SDEs

Neural Information Processing Systems

Backward stochastic differential equation (BSDE)-based deep learning methods provide an alternative to Physics-Informed Neural Networks (PINNs) for solving high-dimensional partial differential equations (PDEs), offering potential algorithmic advantages in settings such as stochastic optimal control, where the PDEs of interest are tied to an underlying dynamical system. However, standard BSDEbased solvers have empirically been shown to underperform relative to PINNs in the literature. In this paper, we identify the root cause of this performance gap as a discretization bias introduced by the standard Euler-Maruyama (EM) integration scheme applied to one-step self-consistency BSDE losses, which shifts the optimization landscape off target. We find that this bias cannot be satisfactorily addressed through finer step-sizes or multi-step self-consistency losses. To properly handle this issue, we propose a Stratonovich-based BSDE formulation, which we implement with stochastic Heun integration. We show that our proposed approach completely eliminates the bias issues faced by EM integration. Furthermore, our empirical results show that our Heun-based BSDE method consistently outperforms EM-based variants and achieves competitive results with PINNs across multiple high-dimensional benchmarks. Our findings highlight the critical role of integration schemes in BSDE-based PDE solvers, an algorithmic detail that has received little attention thus far in the literature.